ESTIMATING CORRELATION FROM HIGH, LOW, OPENING
AND CLOSING PRICES 

By L. C. G. Rogers and Fanyin Zhou 

University of Cambridge and Imperial College London 

In earlier studies, the estimation of the volatility of a stock us- 
ing information on the daily opening, closing, high and low prices 
has been developed; the additional information in the high and low 
prices can be incorporated to produce unbiased (or near-unbiased) 
estimators with substantially lower variance than the simple open- 
close estimator. This paper tackles the more difficult task of estimat- 
ing the correlation of two stocks based on the daily opening, closing, 
high and low prices of each. If we had access to the high and low val- 
ues of some linear combination of the two log prices, then we could 
use the univariate results via polarization, but this is not data that 
is available. The actual problem is more challenging; we present an 
unbiased estimator which halves the variance. 

1. Introduction. There is no doubt that volatility is a central concept
in the theory and application of quantitative finance. In our simplest mod- 
els, we treat volatility as a constant of the Black-Scholes paradigm, but we 
quickly discover that the resulting option pricing formula does not fit reality 
very well, so we consider variants of the basic model, for example, models 
where the volatility is allowed to be stochastic in some way. (The enormous 
literature on GARCH models aims to address similar issues, but cannot be 
viewed as a variant of Black-Scholes, being as it is a firmly discrete-time 
theory.) It is not our purpose here to survey this huge field; the reader may 
consult Ghysels, Harvey and Renault (1996), Shephard (2005) for a survey 
of (some of) what is known on stochastic volatility. Having chosen a par- 
ticular model for volatility, the question of estimating it now arises. Again, 
there is no shortage of papers which propose methods of doing just this; see 
the survey Broto and Ruiz (2004) for further references. How this estima- 
tion is to be carried out depends on the nature of the data available and 
the model to be estimated. For example, if high-frequency data is available,
then we may attempt to estimate volatility through the realized variance
of the path. There are several reasons why this is not necessarily a good 
idea. First, as Alizadeh, Brandt and Diebold (2002) argue, microstructure 
effects such as bid-ask bounce can significantly bias the estimator upward, 
though this problem can be obviated to a large extent by a more inge- 
nious choice of estimator; see, for example, Barndorff-Nielsen and Shephard 
(2004), Zhang, Mykland and Aït-Sahalia (2005). Second, we should expect
that the estimates made will not show much intertemporal stability (in view 
of the well-known profile of intraday trading activity). Indeed, the recent 
work of Barndorff-Nielsen et al. (2007) confirms this, showing estimates of 
volatility which vary very substantially from day to day. Third, we have to 
handle a huge amount of data; while this is not in itself a problem, it is 
reasonable to ask whether the effort (human and computer) is worth the 
goal and, indeed, whether the additional effort will actually help toward the 
goal. Much depends on the intended use, but if we want to price options
or make forecasts a few months into the future, then we should be using
calibration data sampled on a comparable time scale and will require es- 
timates of volatility; studies of high-frequency realized volatility are not so 
much estimating volatility as measuring it. 

In this study, we shall suppose that we are interested in estimating volatil- 
ity and covariances for the purposes of derivative pricing, derivative hedging 
and forecasting. For the reasons just outlined, we propose to restrict our 
attention to daily price data, for lack of convincing evidence that
high-frequency observation helps toward this goal. We shall also discuss only
the estimation of constant volatilities and covariances; if nothing can be done
in this simple situation, then nothing can be done in the more general set- 
ting. The strand of the literature that we develop in this paper is that 
of range-based estimation of volatility. The idea of using information on 
the daily high and low prices, as well as the opening and closing prices, 
goes back a long way, to Parkinson (1980) and Garman and Klass (1980) at 
least, with further contributions by Beckers (1983), Ball and Torous (1984), 
Rogers and Satchell (1991), Kunitomo (1992), Yang and Zhang (2000) and 
Alizadeh, Brandt and Diebold (2002), among others. However, it is only 
comparatively recently that attention has been given to range-based estima- 
tion of covariance between different assets; see, for example, Brunetti and Lildholdt 
(2002), Brandt and Diebold (2006). 

The covariance of assets is important for the computation of the prices
of derivatives written on many underlyings, such as basket options; the
obvious method of estimation (treating the daily log-returns as i.i.d.
multivariate Gaussian variables) produces an unbiased estimator of the
covariance matrix. The question we address in this paper is "Can information
on daily high and low prices be used to make better (i.e., lower mean
squared error) unbiased estimates of the covariance matrix?" The studies of
Brandt and Diebold (2006) and Brunetti and Lildholdt (2002) work with foreign
exchange data, where the availability of data on the cross rates means
that one is able to observe highs and lows of linear combinations of the log
asset prices, allowing one to reduce to existing univariate methodology by
polarization, that is, via the identity $\mathrm{cov}(X, Y) = \frac{1}{4}[\mathrm{var}(X+Y) - \mathrm{var}(X-Y)]$.
However, such an approach would be impossible if assets were
equities, say, since we do not have information on the highs and lows of
linear combinations of the log asset prices (unless full tick data is available,
but this would be a very different question). For such situations, a completely
new approach is required; this is what we undertake in this paper.

In Section 2, we shall, without loss of generality, restrict to the situation 
of two correlated log-Brownian assets, whose rates of growth we shall assume 
are both zero. This assumption, used by various authors, is quite innocent 
if the data is being sampled daily, as the growth rate is negligible in com- 
parison with the fluctuations. We aim to construct an unbiased estimator 
which is a quadratic function of the high, low and closing (log-)price of the 
two assets, and which has smallest MSE. For correlation $\rho = -1, 0, 1$, the
various moments we require are known in closed form, but for other values
of $\rho$, not all were known. [The recent paper Rogers and Shepp (2006) fills in
the missing answers.] What we do is to search among linear combinations
of quadratic functions of the variables (subject to the constraint that the
estimator has no bias if $\rho = -1, 0, 1$) for the estimator that has the smallest
MSE when $\rho = 0$. This produces a new estimator whose variance is half that
of the obvious estimator based solely on closing prices. We present simulation
evidence that this advantage appears to be preserved for other values of
$\rho$ and is partly robust to departures from Gaussian returns. The form of the
estimator is, moreover, insensitive to errors produced by discrete sampling of 
the underlying Brownian motions, a problem encountered with some other 
range-based estimators. 

2. Estimating covariance. We suppose that the log price processes $X_i(t)$,
$i = 1, \dots, n$, are correlated Brownian motions, that is,

$$E[X_i(s)X_j(t)] = \sigma_{ij}\min\{s, t\}$$

for all $i, j$. We write

$$H_i = \max_{0 \le t \le 1} X_i(t), \qquad L_i = \min_{0 \le t \le 1} X_i(t), \qquad S_i = X_i(1)$$

for the high, low and final log price, respectively, over a fixed time interval 
which we lose no generality in supposing to be [0, 1]. We may also restrict 
our attention to the case of just two assets since we may estimate the entire 
correlation matrix if we can handle this case. 

To state the main theoretical result of the paper, we shall suppose that
$X_1$ and $X_2$ are standard Brownian motions, that is, $\sigma_{11} = \sigma_{22} = 1$. (We shall
see almost immediately that this restriction is unnecessary.) In this case,
the only parameter of the problem to be estimated is the correlation $\rho = \sigma_{12}$
and we obtain the following result.

Theorem 1. Among all cross-quadratic functionals (by which we mean
a linear combination of the terms $H_1H_2$, $H_1L_2$, $L_1H_2$, $L_1L_2$, $H_1S_2$, $L_1S_2$,
$S_1H_2$, $S_1L_2$, $S_1S_2$)

$$\hat\rho = \hat\rho(H_1, L_1, S_1, H_2, L_2, S_2)$$

of the high, low and final log-prices of the two assets which satisfy the
unbiasedness condition

$$E_\rho[\hat\rho] = \rho \qquad (\rho = -1, 0, 1), \tag{1}$$

the one whose variance $E_0[\hat\rho^2]$ is minimal when $\rho = 0$ is

$$\hat\rho = \frac{1}{2} S_1 S_2 + \frac{1}{2(1-2b)}(H_1 + L_1 - S_1)(H_2 + L_2 - S_2). \tag{2}$$

The constant $b$ is equal to $2\log 2 - 1 \approx 0.386294$ and the minimized variance
is $E_0[\hat\rho^2] = 1/2$.

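Remark. It is now obvious from Theorem 1, by a simple scaling (writing
$X_i = \sqrt{\sigma_{ii}}\,W_i$ with $W_i$ standard, every term of (2) is multiplied by
$\sqrt{\sigma_{11}\sigma_{22}}$, while $\sigma_{12} = \rho\sqrt{\sigma_{11}\sigma_{22}}$), that for general $\sigma_{ij}$, the estimator

$$\hat\sigma_{12} = \frac{1}{2} S_1 S_2 + \frac{1}{2(1-2b)}(H_1 + L_1 - S_1)(H_2 + L_2 - S_2) \tag{3}$$

is unbiased for $\sigma_{12}$ when $\rho = -1, 0, 1$, and, when $\rho = 0$, minimizes variance.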
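The following Python sketch (ours, not part of the paper) shows how (3) might be applied to one day of data. It assumes, as a hypothetical normalization, that the trading day is the interval [0, 1] and that log prices are measured from the day's open, so that $H$, $L$ and $S$ are the logs of high/open, low/open and close/open; overnight moves are ignored. The names `hls_from_ohlc`, `rz_daily` and `open_close_daily` are illustrative only.

```python
import math

# b = 2 log 2 - 1 (Theorem 1) and the weight 1/(2(1 - 2b)) on the range term of (3)
B = 2.0 * math.log(2.0) - 1.0
RANGE_WEIGHT = 1.0 / (2.0 * (1.0 - 2.0 * B))


def hls_from_ohlc(open_, high, low, close):
    """Return (H, L, S) for one day, with log prices measured from the open.

    Assumption: the trading day is [0, 1] and X(0) = 0 at the open.
    """
    return (math.log(high / open_),
            math.log(low / open_),
            math.log(close / open_))


def rz_daily(h1, l1, s1, h2, l2, s2):
    """One-day value of the range-based estimator (3) of sigma_12."""
    return 0.5 * s1 * s2 + RANGE_WEIGHT * (h1 + l1 - s1) * (h2 + l2 - s2)


def open_close_daily(s1, s2):
    """One-day value of the simple open-close estimator S1 * S2."""
    return s1 * s2


if __name__ == "__main__":
    # Hypothetical prices for one day of two assets: (open, high, low, close)
    day1 = hls_from_ohlc(100.0, 101.5, 99.2, 100.8)
    day2 = hls_from_ohlc(50.0, 50.6, 49.7, 50.4)
    print(rz_daily(*day1, *day2), open_close_daily(day1[2], day2[2]))
```

Over a run of days, the covariance estimate would be the average of the daily values of `rz_daily`, exactly as the open-close estimate is the average of the products $S_1S_2$.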



Proof of Theorem 1. The goal is to make an unbiased estimator of
$\rho$ by forming linear combinations of the nine possible cross terms, $Z_{HH} = H_1H_2$,
$Z_{HL} = H_1L_2$, $Z_{LH} = L_1H_2$, $Z_{LL} = L_1L_2$, $Z_{HS} = H_1S_2$, $Z_{LS} = L_1S_2$,
$Z_{SH} = S_1H_2$, $Z_{SL} = S_1L_2$ and $Z_{SS} = S_1S_2$. Now, the means of these products
are known for the cases $\rho = -1, 0, 1$ and the recent paper Rogers and Shepp
(2006) establishes that

$$E Z_{HH} = f(\rho) = \cos\alpha \int_0^\infty \frac{\cosh(u\alpha)}{\sinh(u\pi/2)}\,\tanh(u\gamma)\,du, \tag{4}$$

where $\rho = \sin\alpha$, $\alpha \in (-\pi/2, \pi/2)$ and $2\gamma = \alpha + \pi/2$. Table 1 summarizes the
situation. We seek a linear combination $\hat\rho$ of the nine cross products with
the following properties:

(i) $E_\rho[\hat\rho] = \rho$ for $\rho = -1, 0, 1$;
(ii) when $\rho = 0$, the variance of $\hat\rho$ is minimal.
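As a quick consistency check of (4) against Table 1 (our own verification, not taken from the paper), put $\rho = 0$, so that $\alpha = 0$ and $\gamma = \pi/4$; the identity $\sinh(u\pi/2) = 2\sinh(u\pi/4)\cosh(u\pi/4)$ then reduces the integral to an elementary one:

```latex
f(0) = \int_0^\infty \frac{\tanh(u\pi/4)}{\sinh(u\pi/2)}\,du
     = \int_0^\infty \frac{du}{2\cosh^2(u\pi/4)}
     = \frac{2}{\pi}\Big[\tanh(u\pi/4)\Big]_0^\infty
     = \frac{2}{\pi},
```

which agrees with $E[H_1]\,E[H_2] = (\sqrt{2/\pi})^2$ for independent motions, the $\rho = 0$ entry for $EZ_{HH}$ in Table 1.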

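In order to find a minimum-variance linear combination, we need to know the
covariance of $Z = (Z_{HH}, Z_{HL}, Z_{LH}, Z_{LL}, Z_{HS}, Z_{LS}, Z_{SH}, Z_{SL}, Z_{SS})$ when
$\rho = 0$. In this case, the two Brownian motions are independent and the
entries of the covariance matrix can be computed from the entries of Table 1.
For example, $E_0[Z_{HH}Z_{SL}] = E_1[Z_{HS}]\,E_1[Z_{HL}] = -b/2$. Routine but
tedious calculations along these lines give every entry of the matrix $V = E_0[ZZ^{\top}]$:
for $A, B, C, D \in \{H, L, S\}$,

$$E_0[Z_{AB}Z_{CD}] = E_1[Z_{AC}]\,E_1[Z_{BD}]. \tag{5}$$

Our objective now is to choose a 9-vector $w$ of weights to minimize $w \cdot Vw$
subject to the constraints $w \cdot y = 0$ and $w \cdot m = 1$, where $y = E_0[Z]$ and $m = E_1[Z]$
are the $\rho = 0$ and $\rho = 1$ columns of Table 1. Since $w \cdot y = 0$ makes $E_0[w \cdot Z] = 0$,
$w \cdot Vw$ is then the variance at $\rho = 0$ of the estimator $w \cdot Z$; and the condition at
$\rho = -1$ holds automatically, because Table 1 gives $E_{-1}[Z] + E_1[Z] = \frac{\pi}{2}(1+b)\,E_0[Z]$.
This simple optimization problem is easily solved: we find that the solution
takes the form

$$w = \alpha V^{-1} m + \beta V^{-1} y, \tag{6}$$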
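Table 1
Means of the components of $Z$

              $\rho=-1$   $\rho=0$    $\rho=1$    $\rho$
$EZ_{HH}$     $b$         $2/\pi$     $1$         $f(\rho)$
$EZ_{HL}$     $-1$        $-2/\pi$    $-b$        $-f(-\rho)$
$EZ_{LH}$     $-1$        $-2/\pi$    $-b$        $-f(-\rho)$
$EZ_{LL}$     $b$         $2/\pi$     $1$         $f(\rho)$
$EZ_{HS}$     $-1/2$      $0$         $1/2$       $\rho/2$
$EZ_{LS}$     $-1/2$      $0$         $1/2$       $\rho/2$
$EZ_{SH}$     $-1/2$      $0$         $1/2$       $\rho/2$
$EZ_{SL}$     $-1/2$      $0$         $1/2$       $\rho/2$
$EZ_{SS}$     $-1$        $0$         $1$         $\rho$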






L. C. G. ROGERS AND F. ZHOU 



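where $\alpha$, $\beta$ are determined by

$$\begin{pmatrix} m \cdot V^{-1}m & m \cdot V^{-1}y \\ y \cdot V^{-1}m & y \cdot V^{-1}y \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \tag{7}$$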



Lengthy but routine calculations lead to the final form (2), as claimed, and
the value $E_0[\hat\rho^2] = 1/2$ is calculated from the explicit forms of $V$, $m$ and $y$.
□
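The optimization (6)-(7) is small enough to check numerically. The NumPy sketch below is our own illustration: it takes $m = E_1[Z]$ and $y = E_0[Z]$ from Table 1, builds $V = E_0[ZZ^{\top}]$ as the Kronecker product $M \otimes M$ of the one-motion moment matrix $M = E[uu^{\top}]$, $u = (H, L, S)$ (valid because the motions are independent at $\rho = 0$), solves (7), and compares the resulting $w$ with the weights of (2). Note that it orders the components of $Z$ as HH, HL, HS, LH, LL, LS, SH, SL, SS (Kronecker order), not in the order used in the text.

```python
import math
import numpy as np

b = 2.0 * math.log(2.0) - 1.0

# E[u u^T] for u = (H, L, S) of one standard Brownian motion on [0, 1];
# these are the rho = 1 entries of Table 1.
M = np.array([[1.0, -b, 0.5],
              [-b, 1.0, 0.5],
              [0.5, 0.5, 1.0]])

# Z = u1 (x) u2 in Kronecker order; independence at rho = 0 gives E_0[Z Z^T] = M (x) M.
V = np.kron(M, M)

m = M.flatten()                        # E_1[Z]   (rho = 1)
e = math.sqrt(2.0 / math.pi)           # E[H] = -E[L] = sqrt(2/pi)
mu = np.array([e, -e, 0.0])
y = np.outer(mu, mu).flatten()         # E_0[Z]   (rho = 0)
m_minus = np.array([[b, -1.0, -0.5],   # E_{-1}[Z] (rho = -1, i.e. X2 = -X1)
                    [-1.0, b, -0.5],
                    [-0.5, -0.5, -1.0]]).flatten()

# Solve the 2x2 system (7) for alpha, beta and form w as in (6).
Vm = np.linalg.solve(V, m)
Vy = np.linalg.solve(V, y)
G = np.array([[m @ Vm, m @ Vy],
              [y @ Vm, y @ Vy]])
alpha, beta = np.linalg.solve(G, np.array([1.0, 0.0]))
w = alpha * Vm + beta * Vy

# Weights of (2): (1/2) S1 S2 + c (H1 + L1 - S1)(H2 + L2 - S2), with c = 1/(2(1 - 2b)).
c = 1.0 / (2.0 * (1.0 - 2.0 * b))
d = np.array([1.0, 1.0, -1.0])
e_s = np.array([0.0, 0.0, 1.0])
w_closed = c * np.kron(d, d) + 0.5 * np.kron(e_s, e_s)

print(np.allclose(w, w_closed))   # do (6)-(7) reproduce (2)?
print(w @ V @ w)                  # minimized variance at rho = 0
print(w @ m_minus)                # mean of the estimator at rho = -1
```

Only two constraints are imposed in the solve; the printed mean at $\rho = -1$ confirms that the third condition in (1) comes for free.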

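Remark. (i) It is clear that if we are trying to produce an estimate
of the covariance matrix of more than two Brownian motions, estimating
each entry by means of (2), then the matrix will have rank at most 2 and be
nonnegative definite: for a single period, with $S = (S_1, \dots, S_n)^{\top}$ and
$D = (H_1 + L_1 - S_1, \dots, H_n + L_n - S_n)^{\top}$, it equals $\frac{1}{2}SS^{\top} + \frac{1}{2(1-2b)}DD^{\top}$,
and $1 - 2b > 0$.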

(ii) One problem identified in the earlier literature with estimators based
on high and low values occurs when we observe the Brownian motions
discretely, at equally spaced times: say we observe $H^{(N)} = \sup\{X(i/N) : i = 0, \dots, N\}$
and $L^{(N)} = \inf\{X(i/N) : i = 0, \dots, N\}$. These substantially
underestimate the supremum and overestimate the infimum. A correction is
known to deal with this [see Broadie, Glasserman and Kou (1997)], but we
see that as we only ever need to calculate $H + L$, the discretization errors
cancel out on average, because $H - H^{(N)}$ and $L^{(N)} - L$
have the same distribution, by symmetry.
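Remark (ii) can be illustrated by simulation. In the following sketch (an illustration under our own assumptions, not the paper's experiment), a path sampled on a fine grid of 1,000 steps stands in for the continuous path, and every 20th point plays the role of the discrete sample, so $N = 50$. The high is underestimated and the low overestimated by similar average amounts, so the average error in $H + L$ should be close to zero. The function name and its parameters are hypothetical.

```python
import numpy as np


def coarse_vs_fine(n_paths=2000, n_fine=1000, coarse_every=20, seed=0):
    """Average discretization errors of H, L and H + L for Brownian paths on [0, 1].

    The fine grid (n_fine steps) approximates the continuous path; the coarse grid
    keeps every `coarse_every`-th point, i.e. N = n_fine / coarse_every steps.
    """
    rng = np.random.default_rng(seed)
    steps = rng.normal(0.0, np.sqrt(1.0 / n_fine), size=(n_paths, n_fine))
    paths = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)
    coarse = paths[:, ::coarse_every]   # t = 1 is included when coarse_every divides n_fine
    h_fine, l_fine = paths.max(axis=1), paths.min(axis=1)
    h_coarse, l_coarse = coarse.max(axis=1), coarse.min(axis=1)
    return {
        "high_shortfall": float(np.mean(h_fine - h_coarse)),  # estimates E[H - H^(N)] > 0
        "low_excess": float(np.mean(l_coarse - l_fine)),      # estimates E[L^(N) - L] > 0
        "sum_error": float(np.mean((h_coarse + l_coarse) - (h_fine + l_fine))),
    }


print(coarse_vs_fine())
```

The first two averages are individually of order $N^{-1/2}$, while their difference, which is the average error in $H + L$, is only sampling noise.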

(iii) The means in the last five lines of Table 1 are exactly linear in $\rho$,
whereas the means in the first four are not. The function $f$ is well
approximated by a quadratic; the difference between $f$ and its quadratic
approximation (which is exact at $\rho = -1, 0, 1$) is never more than 0.65%. However,
if we compute the mean of $\hat\rho$, we find

$$\psi(\rho) \equiv E_\rho[\hat\rho] = \frac{\rho}{2} + \frac{1}{2(1-2b)}\,E_\rho\big[(H_1 + L_1)(H_2 + L_2) - S_1(H_2 + L_2) - S_2(H_1 + L_1) + S_1S_2\big] = \frac{\rho}{2} + \frac{2f(\rho) - 2f(-\rho) - \rho}{2(1-2b)},$$

where the second equality uses Table 1. Now, if we simply replace the function $f$ by its
quadratic approximation $q$, this expression collapses to $\rho$, since $q(\rho) - q(-\rho) = (1-b)\rho$.
In other words, replacing $f$ by its quadratic approximation prevents us from
understanding and correcting for the bias in the estimator $\hat\rho$.

What we propose to do, therefore, is the following. We suppose that we
see data from a run of $N$ days and on day $i$, we compute the value $r_i$ (say)
of $\hat\rho$. We then take the mean $\bar r$ of the $r_i$ and use as our estimator of $\rho$

$$\hat\rho_{RZ} = \psi^{-1}(\bar r). \tag{8}$$
Table 2
Simulation results for Brownian motion

$\rho$   $\hat\rho_0$   SD($\hat\rho_0$)   $\hat\rho_{RZ}$   SD($\hat\rho_{RZ}$)   Variance ratio
 -0.9    -0.9069        1.367              -0.9082           0.8831                2.3950
 -0.8    -0.7930        1.290              -0.7950           0.8396                2.3600
 -0.7    -0.7067        1.239              -0.7005           0.8079                2.3505
 -0.6    -0.5880        1.157              -0.5872           0.7678                2.2719
 -0.5    -0.5064        1.137              -0.5045           0.7680                2.1917
 -0.4    -0.4030        1.075              -0.3962           0.7377                2.1252
 -0.3    -0.2971        1.060              -0.2981           0.7178                2.1812
 -0.2    -0.2075        1.019              -0.1957           0.7056                2.0835
 -0.1    -0.0970        1.003              -0.1004           0.7101                1.9961
  0.0    -0.0038        0.999              -0.0011           0.7021                2.0285
  0.1     0.0992        1.010               0.0943           0.7151                1.9942
  0.2     0.2083        1.014               0.2086           0.7111                2.0331
  0.3     0.3051        1.042               0.3028           0.7187                2.1032
  0.4     0.4089        1.096               0.4037           0.7370                2.2128
  0.5     0.5013        1.124               0.5055           0.7649                2.1611
  0.6     0.5967        1.159               0.6032           0.7812                2.1994
  0.7     0.6913        1.190               0.6946           0.7941                2.2468
  0.8     0.8062        1.309               0.7979           0.8441                2.4057
  0.9     0.9012        1.344               0.9042           0.8671                2.4038



Though the function $\psi$ is not available in closed form, its numerical values
can easily be computed at any desired grid of points in $[-1, 1]$ and then
interpolated.
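A minimal pure-Python sketch of the computation just described, under our own choices. $f$ is evaluated from (4) by composite Simpson's rule on a truncated range (the integrand decays like $e^{-(\pi/2 - |\alpha|)u}$). $\psi$ is then obtained from the formula in Remark (iii), tabulated on an equally spaced grid with the exact end values $\psi(\pm 1) = \pm 1$, and inverted by linear interpolation to give (8). The grid spacing, truncation point and number of Simpson panels are illustrative. The inversion assumes $\psi$ is increasing, which the sketch checks on the grid rather than proves.

```python
import bisect
import math

B = 2.0 * math.log(2.0) - 1.0  # b = 2 log 2 - 1


def f(rho, n=4000):
    """E[H1 H2] from (4) for -1 < rho < 1, by composite Simpson's rule (n even)."""
    alpha = math.asin(rho)
    gamma = 0.5 * (alpha + 0.5 * math.pi)
    upper = 50.0 / (0.5 * math.pi - abs(alpha))  # integrand is ~exp(-50) beyond this

    def g(u):
        if u == 0.0:
            return 2.0 * gamma / math.pi  # limit of the integrand as u -> 0
        # cosh(u*alpha) / sinh(u*pi/2), rewritten to avoid overflow for large u
        denom = -math.expm1(-math.pi * u)  # = 1 - exp(-pi*u)
        ratio = (math.exp(u * (alpha - 0.5 * math.pi))
                 + math.exp(-u * (alpha + 0.5 * math.pi))) / denom
        return ratio * math.tanh(u * gamma)

    h = upper / n
    total = g(0.0) + g(upper)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    return math.cos(alpha) * total * h / 3.0


def psi(rho):
    """Mean of the estimator (2) under correlation rho (Remark (iii))."""
    return 0.5 * rho + (f(rho) - f(-rho) - 0.5 * rho) / (1.0 - 2.0 * B)


def build_psi_grid(step=0.05):
    """Tabulate psi on [-1, 1], using the exact end values psi(-1) = -1, psi(1) = 1."""
    k_max = round(1.0 / step)
    rhos = [k / k_max for k in range(-k_max, k_max + 1)]
    values = [-1.0] + [psi(r) for r in rhos[1:-1]] + [1.0]
    if any(v1 <= v0 for v0, v1 in zip(values, values[1:])):
        raise ValueError("psi is not increasing on this grid")
    return rhos, values


def psi_inverse(r_bar, rhos, values):
    """Invert psi by linear interpolation; arguments outside [-1, 1] are clipped."""
    if r_bar <= values[0]:
        return rhos[0]
    if r_bar >= values[-1]:
        return rhos[-1]
    i = bisect.bisect_right(values, r_bar)
    t = (r_bar - values[i - 1]) / (values[i] - values[i - 1])
    return rhos[i - 1] + t * (rhos[i] - rhos[i - 1])


def rho_rz(daily_values, rhos, values):
    """Estimator (8): apply psi^{-1} to the mean of the daily values r_i of (2)."""
    r_bar = sum(daily_values) / len(daily_values)
    return psi_inverse(r_bar, rhos, values)
```

For example, `rhos, values = build_psi_grid()` is computed once, after which `rho_rz(r, rhos, values)` returns (8) for a list `r` of daily values of (2). Clipping at $\pm 1$ is a design choice of the sketch for samples whose mean falls outside the range of $\psi$.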

3. Simulation study. We have carried out a simulation study of the
estimators. For each $\rho = -0.9, -0.8, \dots, 0.9$, we generated 20,000 paths (of
duration 1) of correlated standard Brownian motions, with 500 steps on each
path, and for each path, we computed and stored the values of $\hat\rho_0 = S_1S_2$
and $\hat\rho_{RZ}$. The results are reported in Table 2. We give the sample means
and standard deviations of the two estimators for each value of $\rho$ and we
also present the ratio of the sample variance of $\hat\rho_0$ over the sample variance
of $\hat\rho_{RZ}$. We see that this ratio never falls appreciably below 2 (its smallest
values, 1.9961 and 1.9942, occur at $\rho = -0.1$ and $\rho = 0.1$), being smallest
around $\rho = 0$, where theory predicts the value 2 exactly.

We see that both estimators are close to the true values across the entire
range of $\rho$-values chosen, but that $\hat\rho_{RZ}$ has about half the variance of the
simple estimator $\hat\rho_0$, or less.
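A scaled-down version of this experiment can be sketched with NumPy. This is our own illustration, not the code behind Table 2: it uses fewer paths (10,000) and steps (200) by default, and it reports the quadratic estimator (2) itself, without the $\psi^{-1}$ correction of (8), so it is not expected to reproduce Table 2 exactly.

```python
import math
import numpy as np

B = 2.0 * math.log(2.0) - 1.0
C = 1.0 / (2.0 * (1.0 - 2.0 * B))  # weight on the range term in (2)


def simulate(rho, n_paths=10000, n_steps=200, seed=1):
    """Daily values of the open-close estimator S1*S2 and of the quadratic
    estimator (2) for correlated standard Brownian motions on [0, 1]."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    z1 = rng.standard_normal((n_paths, n_steps))
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.standard_normal((n_paths, n_steps))
    stats = []
    for z in (z1, z2):
        x = np.cumsum(math.sqrt(dt) * z, axis=1)
        x = np.concatenate([np.zeros((n_paths, 1)), x], axis=1)  # X(0) = 0
        stats.append((x.max(axis=1), x.min(axis=1), x[:, -1]))    # (H, L, S)
    (h1, l1, s1), (h2, l2, s2) = stats
    rho0 = s1 * s2
    rho_hat = 0.5 * s1 * s2 + C * (h1 + l1 - s1) * (h2 + l2 - s2)
    return rho0, rho_hat


for rho in (-0.5, 0.0, 0.5):
    rho0, rho_hat = simulate(rho)
    ratio = rho0.var(ddof=1) / rho_hat.var(ddof=1)
    print(f"rho={rho:+.1f}  mean S1S2={rho0.mean():+.4f}  "
          f"mean (2)={rho_hat.mean():+.4f}  variance ratio={ratio:.3f}")
```

At $\rho = 0$ the variance ratio should come out near the theoretical value 2, up to sampling error and the effect of discrete sampling.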

To check the robustness of the estimator to model assumptions, we repeated
the simulation study using a variance gamma (VG) process instead
of Brownian motion, once again with 20,000 paths sampled at 500 points in
time. The results are reported in Table 3. Probably the most striking feature
is the fact that the estimator $\hat\rho_{RZ}$ is now very substantially biased, even for
moderately small values of $\rho$. We conclude that the use of this estimator is
not advisable if we are not satisfied that the underlying process is Brownian
motion. Observe that the bias is always in the direction of underestimating
the magnitude of the correlation.

Table 3
Simulation results for VG process

$\rho$   $\hat\rho_0$   SD($\hat\rho_0$)   $\hat\rho_{RZ}$   SD($\hat\rho_{RZ}$)   Variance ratio
 -0.9    -0.8969        2.0253             -0.6847           1.1751                2.9705
 -0.8    -0.8094        1.9726             -0.6112           1.1094                3.1619
 -0.7    -0.6681        1.6592             -0.525            0.9746                2.8982
 -0.6    -0.6054        1.565              -0.4683           0.9070                2.9771
 -0.5    -0.5041        1.4674             -0.3944           0.8512                2.972
 -0.4    -0.3928        1.228              -0.3133           0.7264                2.8579
 -0.3    -0.3017        1.1538             -0.2409           0.6792                2.8863
 -0.2    -0.2000        1.0383             -0.1637           0.6063                2.9331
 -0.1    -0.0854        1.0075             -0.0779           0.5759                3.0607
  0.0    -0.0069        0.9940             -0.0029           0.5445                3.3326
  0.1     0.0967        0.9975              0.0827           0.5694                3.0695
  0.2     0.2057        1.0642              0.1660           0.6150                2.9949
  0.3     0.3068        1.1338              0.2470           0.6761                2.8119
  0.4     0.3891        1.2734              0.3101           0.7514                2.8722
  0.5     0.4883        1.4006              0.3870           0.8192                2.9233
  0.6     0.5999        1.549               0.4701           0.9150                2.8658
  0.7     0.7253        1.8293              0.5515           1.0414                3.0855
  0.8     0.8042        1.9081              0.6118           1.0988                3.0155
  0.9     0.8941        2.0951              0.6807           1.2121                2.988
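The paper does not spell out how its VG paths were generated. One common construction, assumed here purely for illustration, runs two correlated Brownian motions on a shared gamma clock $G$ with $E[G(t)] = t$ and $\mathrm{var}\,G(t) = \nu t$; the variance parameter `nu` and the function name `simulate_vg` are hypothetical. Since $E[G(1)] = 1$, the open-close estimator $S_1S_2$ remains unbiased for $\rho$ under this construction, whereas the mean of (2) need no longer equal $\psi(\rho)$, so (8) loses its justification.

```python
import math
import numpy as np

B = 2.0 * math.log(2.0) - 1.0
C = 1.0 / (2.0 * (1.0 - 2.0 * B))  # weight on the range term in (2)


def simulate_vg(rho, nu=0.5, n_paths=10000, n_steps=200, seed=2):
    """Daily values of S1*S2 and of (2) for correlated VG paths on [0, 1].

    Assumed construction: X_i(t) = W_i(G(t)), with (W_1, W_2) correlated Brownian
    motions and G a gamma process with mean t and variance nu * t.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    dg = rng.gamma(shape=dt / nu, scale=nu, size=(n_paths, n_steps))  # clock increments
    z1 = rng.standard_normal((n_paths, n_steps))
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.standard_normal((n_paths, n_steps))
    stats = []
    for z in (z1, z2):
        x = np.cumsum(np.sqrt(dg) * z, axis=1)                    # W_i run on the gamma clock
        x = np.concatenate([np.zeros((n_paths, 1)), x], axis=1)  # X(0) = 0
        stats.append((x.max(axis=1), x.min(axis=1), x[:, -1]))    # (H, L, S)
    (h1, l1, s1), (h2, l2, s2) = stats
    rho0 = s1 * s2
    rho_hat = 0.5 * s1 * s2 + C * (h1 + l1 - s1) * (h2 + l2 - s2)
    return rho0, rho_hat


rho0, rho_hat = simulate_vg(0.5)
print(rho0.mean(), rho_hat.mean(), rho0.var(ddof=1) / rho_hat.var(ddof=1))
```

How far the mean of (2) drifts from its Brownian value depends on `nu`; the sketch makes no claim to match the parameters behind Table 3.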

As a further check of robustness, we performed the same simulation, but 
using a Brownian motion with drift 0.1. The results are reported in Table
4. This time, the bias of $\hat\rho_{RZ}$ is small, but the variance advantage persists.

4. Empirical study. In this section, we examine a small data set of prices
of four stocks: Boeing (BA), GlaxoSmithKline (GSK), General Motors (GM)
and Procter & Gamble (PG). The prices were from the NYSE,
for the period 4th February 2002 up to 12th July 2006, a period of 1,118
trading days. The data were obtained from Yahoo Finance. The results are
presented in Tables 5 and 6, and in Figure 1. Table 5 presents the point
estimates (sample means) of the correlation computed first by the simple
open-close estimator and second by the estimator $\hat\rho_{RZ}$. Table 6 gives the ratio
of the sample variances of the two estimators, the sample variance of $\hat\rho_{RZ}$ being
expressed as a percentage of the sample variance of $\hat\rho_0$. We can see that the
point estimators of the correlation are reasonably close, but noticeably
different in places; however, inspection of Figure 1 shows that the differences
are well within sampling error.
Table 4
Simulation results for Brownian motion with drift 0.1

$\rho$   $\hat\rho_0$   SD($\hat\rho_0$)   $\hat\rho_{RZ}$   SD($\hat\rho_{RZ}$)   Variance ratio
 -0.9    -0.8960        1.3634             -0.8898           0.8560                2.5372
 -0.8    -0.7842        1.2769             -0.7878           0.8267                2.3857
 -0.7    -0.6874        1.2068             -0.6917           0.7910                2.3277
 -0.6    -0.5817        1.1604             -0.5840           0.7659                2.2953
 -0.5    -0.4851        1.1123             -0.4895           0.7482                2.21
 -0.4    -0.3953        1.099              -0.3961           0.7481                2.1582
 -0.3    -0.2868        1.0469             -0.2855           0.7196                2.1167
 -0.2    -0.1851        1.0327             -0.1929           0.7229                2.0407
 -0.1    -0.0871        1.0087             -0.0935           0.7120                2.0074
  0.0     0.0143        0.9994              0.0047           0.7050                2.0093
  0.1     0.1104        1.0095              0.1091           0.7082                2.0319
  0.2     0.2130        1.0575              0.208            0.7196                2.1598
  0.3     0.3076        1.0599              0.3005           0.7216                2.1572
  0.4     0.4088        1.0831              0.4045           0.7359                2.166
  0.5     0.5118        1.135               0.5062           0.7602                2.2291
  0.6     0.6241        1.2004              0.6108           0.7827                2.3523
  0.7     0.7157        1.2345              0.6987           0.7981                2.3928
  0.8     0.8153        1.3177              0.8015           0.8371                2.4777
  0.9     0.9199        1.3979              0.9114           0.8937                2.4465



The sample variance of $\hat\rho_{RZ}$ is substantially less than the sample variance
of the simple estimator $\hat\rho_0$, so we see that for this data, the theoretical
advantage of $\hat\rho_{RZ}$, namely its lower mean-square error, appears to hold.
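A sketch of the per-pair summaries behind Tables 5 and 6 and Figure 1, under assumptions of ours. Each stock's daily (open, high, low, close) prices are turned into $(H, L, S)$ by measuring log prices from the open. The daily values of $S_1S_2$ and of (3) are then averaged, and their sample variances are compared in percent as in Table 6. The 95% intervals use the normal approximation mean ± 1.96·SD/√N, which treats days as independent. The sketch works with covariances; the step from these to the correlations of Table 5 is not reproduced. `daily_hls` and `compare_pair` are hypothetical names.

```python
import numpy as np

B = 2.0 * np.log(2.0) - 1.0
C = 1.0 / (2.0 * (1.0 - 2.0 * B))  # weight on the range term in (3)


def daily_hls(open_, high, low, close):
    """Log high, low and close relative to the open (assumed normalization)."""
    o = np.asarray(open_, dtype=float)
    return (np.log(np.asarray(high, dtype=float) / o),
            np.log(np.asarray(low, dtype=float) / o),
            np.log(np.asarray(close, dtype=float) / o))


def compare_pair(ohlc1, ohlc2, z=1.96):
    """Mean and normal-approximation interval for the daily open-close and
    range-based covariance values of one pair, plus the variance ratio in %."""
    h1, l1, s1 = daily_hls(*ohlc1)
    h2, l2, s2 = daily_hls(*ohlc2)
    simple = s1 * s2
    rz = 0.5 * s1 * s2 + C * (h1 + l1 - s1) * (h2 + l2 - s2)
    n = len(simple)
    out = {}
    for name, r in (("simple", simple), ("rz", rz)):
        half = z * r.std(ddof=1) / np.sqrt(n)
        out[name] = (r.mean(), r.mean() - half, r.mean() + half)
    out["variance_ratio_pct"] = 100.0 * rz.var(ddof=1) / simple.var(ddof=1)
    return out
```

Each argument is a tuple of four equal-length arrays (open, high, low, close), one entry per trading day.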

Table 5
Point estimates of correlation

Estimated correlation matrix using $\hat\rho_0$

         BA       GSK      GM       PG
BA     1.0000   0.3354   0.3294   0.3201
GSK             1.0000   0.2987   0.3464
GM                       1.0000   0.2102
PG                                1.0000

Estimated correlation matrix using $\hat\rho_{RZ}$

         BA       GSK      GM       PG
BA     1.0000   0.2948   0.2925   0.2562
GSK             1.0000   0.2208   0.3327
GM                       1.0000   0.2086
PG                                1.0000

Table 6
Ratio of sample variances

Ratio of sample variance of $\hat\rho_{RZ}$ to sample variance of $\hat\rho_0$ (in %)

         BA      GSK     GM      PG
BA     92.43   55.49   45.49   60.88
GSK            54.74   45.90   55.09
GM                     78.02   48.12
PG                             54.97

[Figure 1: Estimates of rho, with 95% confidence intervals; horizontal axis: Pair.]

Fig. 1. Estimates of $\rho$. Estimated values are given by solid lines (circle for simple
estimator, diamond for $\hat\rho_{RZ}$) and the 95% confidence intervals are given by the dashed
lines. The pairs in Figure 1 are listed in the order BA:GSK, BA:GM, GSK:GM, BA:PG,
GSK:PG, GM:PG.

5. Conclusions. We have presented a new estimator for the correlation
of asset prices, based on the information contained in daily high, low, open
and close prices. In contrast to other studies, we have not supposed that
the high and low prices of some linear combination of the log prices are
available. While this supposition might be reasonable if the assets were
currencies (when the cross rates would provide the required information), it
would only be possible in the context of equity if high-frequency data were
available. We have found the cross-quadratic estimator that is unbiased for
$\rho = -1, 0, 1$ and has minimum variance at $\rho = 0$, and have investigated its
properties. Simulation experiments showed that the estimator behaved as
expected for log-Brownian data, but that the performance on simulated variance
gamma data was poor. A small-scale study of prices of equity in major US firms
showed that the two estimators agreed to within sampling error and that the
sample variance of the new estimator was considerably less. As with range-based
estimation of volatility, we conclude that range-based estimation of correlation
lacks dependable and decisive advantages over the simpler estimators based only
on the open-close prices.

Nevertheless, it seems that it is always worth computing the new estima- 
tor, if only as a comparison with the simple open-close estimator. Widely 
differing numerical values may indicate a departure from log-normality that 
requires further investigation. 